Analysis of Similarity Measures between Short Text for the NTCIR-12 Short Text Conversation Task

Authors

  • Kozo Chikai
  • Yuki Arase
Abstract

With the rise of social networking services, short texts such as micro-blog posts have become a valuable resource for practical applications. When text data is used in applications, estimating the similarity between texts is an important step. Conventional methods assume that an input text is long enough for statistical approaches, e.g., counting word occurrences, to be reliable. However, micro-blog posts are much shorter; for example, tweets posted to Twitter are restricted to 140 characters. This is a critical problem for conventional methods because such short text does not provide reliable statistics. In this study, we compare state-of-the-art methods for estimating text similarity to investigate how well they handle short text, specifically in the scenario of short text conversation. We implement a conversation system using a million tweets crawled from Twitter. Our system also employs a supervised learning approach to decide whether a tweet can be a reply to an input, which proved effective in the NTCIR-12 Short Text Conversation Task.
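The word-counting problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and cosine similarity over raw word counts; the function name and the example messages are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two hypothetical short messages on a related topic that share no words.
post = "my flight got delayed again"
reply = "airlines never run on time"

print(cosine_similarity(post, reply))  # no shared words, so the score is zero
print(cosine_similarity(post, post))   # identical texts score close to 1.0
```

With only a handful of words per message, related texts often have no words in common, so a count-based score gives no signal. This is the kind of sparsity the abstract refers to.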

Related resources

Nders at the NTCIR-12 STC Task: Ranking Response Messages with Mixed Similarity for Short Text Conversation

Short Text Conversation (STC) is a typical scenario in man-machine conversation, which simplifies the conversation into a one-round interaction and makes the related tasks more practical. This paper presents a simple approach to the Chinese STC task issued by NTCIR-12. Given a repository of post-comment pairs, for any query, we define three types of similarity and merge them according to empirica...

ICL00 at the NTCIR-12 STC Task: Semantic-based Retrieval Method of Short Texts

We take part in the short text conversation task at NTCIR-12. We employ a semantic-based retrieval method to tackle this problem, by calculating textual similarity between posts and comments. Our method applies a rich-feature model to match post-comment pairs, by using semantic, grammar, n-gram and string features to extract high-level semantic meanings of text.

Overview of the NTCIR-12 Short Text Conversation Task

We describe an overview of the NTCIR-12 Short Text Conversation (STC) task, which is a new pilot task of NTCIR-12. STC consists of two subtasks: a Chinese subtask using post-comment pairs crawled from Weibo, and a Japanese subtask providing the IDs of such pairs from Twitter. Thus, the main difference between the two subtasks lies in the sources and languages of the test collections. For the Ch...

UWNLP at the NTCIR-12 Short Text Conversation Task

In this paper, we describe our submission to the NTCIR-12 Short Text Conversation task. We treat short text conversation as a community question-answering problem and solve the task in three steps: First, we retrieve a set of candidate posts from a pre-built indexing service. Second, these candidate posts are ranked according to their similarity with the original input post. Finally, w...
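The first two steps described above (candidate retrieval, then similarity ranking) can be sketched roughly as follows. This is an illustration under stated assumptions, not the UWNLP system: the in-memory inverted index, the Jaccard similarity, and all function names are hypothetical stand-ins, and the third step is omitted because it is truncated in the source.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercased whitespace tokens as a set (an assumed, simplistic tokenizer)."""
    return set(text.lower().split())


def build_index(posts):
    """Toy inverted index: map each token to the ids of posts containing it."""
    index = defaultdict(set)
    for post_id, text in enumerate(posts):
        for token in tokenize(text):
            index[token].add(post_id)
    return index


def retrieve(index, query):
    """Step 1: collect ids of posts sharing at least one token with the query."""
    candidates = set()
    for token in tokenize(query):
        candidates |= index.get(token, set())
    return candidates


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(posts, candidates, query):
    """Step 2: order candidate ids by similarity to the query, best first."""
    q = tokenize(query)
    return sorted(candidates, key=lambda i: (-jaccard(q, tokenize(posts[i])), i))


# Hypothetical repository of posts.
posts = [
    "the weather is nice today",
    "is the train late today",
    "i love ramen",
]
index = build_index(posts)
candidates = retrieve(index, "is the train late")
ranked = rank(posts, candidates, "is the train late")
print(ranked)  # [1, 0]
```

Splitting retrieval from ranking keeps the more expensive similarity computation limited to the small candidate set instead of the whole repository.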

A Combination of Similarity and Rule-based Method of PolyU for NTCIR-12 STC Task

In this report, we describe the approach we used in the NTCIR-12 Short Text Conversation task. Because we registered for the task late and had less than one week to work on it, we designed a simple approach based on cosine similarity between sentences and some handcrafted rules. The results show the effectiveness of our method.

Publication date: 2016